Recent Developments in AI

Alan Feder

May 22, 2025

Who am I?

  • Alan Feder
  • Staff LLM Data Scientist at ManTech
  • Columbia University
    • BA in Mathematics, MA in Statistics
  • 15 years of industry experience
    • Federal Government, Investments, FinTech, Insurance, Pharma, HealthTech, Cybersecurity

Where were we?

  • AI was primarily about chatbots
    • Question in, answer out
    • Lots of hallucinations
  • People didn’t pay much attention to anything but text
    • Maybe a little bit of image generation, mostly to laugh at generated people with seven fingers

Chatbot Change #1 - Reasoning

  • Trained to break down complex problems into smaller steps
    • Intermediate reasoning steps before arriving at a final answer

Cheerios Example


Why does Reasoning matter?

  • Fewer math mistakes
  • Handles multi-step tasks like budgets or plans

When to use Reasoning?

  • Use Reasoning Models
    • Complex, multi-step problems (math, logic, puzzles)
    • Need to inspect “show your work” chains of thought
    • Integrating external tools (APIs, code, calculators)
    • Multi-hop Q&A or very long context windows
  • Use Traditional Models
    • Simple text transformations (summaries, translation)
    • Emails that sound personal
    • Latency- or cost-sensitive scenarios
    • Creative work

Chatbot Change #2 - Long Context

Previously, some of these models were limited to about 3,000 words (~6 pages)

Now some models can take in about 2 million words (roughly 3x the length of War and Peace)
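
The comparison above is simple arithmetic, sketched here as a sanity check. The 500 words per page is implied by the slide's own "3,000 words ≈ 6 pages"; the War and Peace word count is an assumption (a commonly cited figure for English translations), not a number from this talk.

```python
# Back-of-the-envelope check of the context-length comparison.
WORDS_PER_PAGE = 500            # implied by "3,000 words ~ 6 pages"
WAR_AND_PEACE_WORDS = 587_000   # ASSUMPTION: commonly cited English word count

old_limit_words = 3_000
new_limit_words = 2_000_000

old_pages = old_limit_words / WORDS_PER_PAGE
new_pages = new_limit_words / WORDS_PER_PAGE
ratio = new_limit_words / WAR_AND_PEACE_WORDS

print(f"Old limit: {old_pages:.0f} pages")
print(f"New limit: {new_pages:.0f} pages")
print(f"New limit vs. War and Peace: {ratio:.1f}x")
```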


Why does long context matter?

Easier to load in lots of information (e.g., an entire tax return) and ask questions about it

Chatbot Change #3 - Open Source

  • 🇨🇳 Open Source models, nearly as good as 🇺🇸
    • DeepSeek R1 chatbot, and others
    • DeepSeek reportedly trained its model for less than $6m
    • Released model for free - became #1 on 🍎 App Store
    • NVIDIA market cap dropped more than $500b in one day

Why did DeepSeek matter?

  • If you can get the compute yourself, you don’t need to rely on one of the big companies
  • Is this the end of 🇺🇸 AI dominance?

AI is more than just chatting and cheating on exams

Agents

What are Agents?

  • Autonomous helpers that break your goal into steps and act on them for you
  • Plan, use tools, and remember past steps to stay on track
  • Run on their own once given a clear objective

How AI/LLMs Power Agents

  • LLM “brain”
    Reads your request in plain text and generates step-by-step plans

  • Autonomous action
    Uses those plans to call tools (APIs, calculators, calendars) on its own

  • Built-in memory
    Remembers earlier steps so it can handle multi-step tasks without losing context
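
The three pieces above (plan, act with tools, remember) can be sketched as a toy loop. This is a minimal illustration, not how any particular agent framework works: the `Agent` class, the `add` and `multiply` tools, and the hard-coded plan are all hypothetical, and the `plan` method stands in for what would be an LLM call in a real agent.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical tools: plain functions standing in for real APIs or calculators.
def add(a: float, b: float) -> float:
    return a + b

def multiply(a: float, b: float) -> float:
    return a * b

TOOLS: dict[str, Callable[..., float]] = {"add": add, "multiply": multiply}

@dataclass
class Agent:
    # Built-in memory: a log of every step taken so far.
    memory: list[str] = field(default_factory=list)

    def plan(self, goal: str) -> list[tuple[str, tuple]]:
        # Stand-in for the LLM "brain". A real agent would send `goal`
        # to a model and parse the plan it returns. The string "prev"
        # means "use the result of the previous step".
        return [("add", (2, 3)), ("multiply", ("prev", 4))]

    def run(self, goal: str) -> float:
        result = 0.0
        for tool_name, args in self.plan(goal):
            # Fill in the previous result wherever the plan asks for it.
            resolved = [result if a == "prev" else a for a in args]
            # Autonomous action: call the tool without asking the user.
            result = TOOLS[tool_name](*resolved)
            self.memory.append(f"{tool_name}{tuple(resolved)} -> {result}")
        return result

agent = Agent()
print(agent.run("compute (2 + 3) * 4"))
print(agent.memory)
```

The memory list is what lets the second step build on the first; without it, each tool call would start from scratch.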

Examples of Agents

  • Manage your email inbox & calendar
  • Book travel for you
  • Plan a schedule

Risks of Agents

Accuracy is even more important

  • If the LLM's output actually drives actions, its errors actually matter
  • If each function call is 95% accurate and you chain 20 of them together, there is only about a 36% chance the final answer is right (0.95^20 ≈ 0.36)
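
The chained-accuracy figure can be checked in a few lines. This sketch assumes each call succeeds or fails independently of the others, which is a simplification; the function name is just for illustration.

```python
def chain_success_probability(per_step_accuracy: float, steps: int) -> float:
    """Chance that every step in a chain succeeds, assuming independent steps."""
    return per_step_accuracy ** steps

p = chain_success_probability(0.95, 20)
print(f"Chance all 20 steps are right: {p:.1%}")

# How quickly reliability decays as the chain gets longer.
for steps in (1, 5, 10, 20):
    print(steps, f"{chain_success_probability(0.95, steps):.1%}")
```

Even a small per-step error rate compounds quickly, which is why long agent chains need checks along the way.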

Privacy/Safety is even more important

An Agent for Academics

Research agents have been built into ChatGPT, Claude, Gemini, and Grok

Example

NotebookLM - Podcast Generation

Example

How to Build Agents?

MCP - Model Context Protocol

  • “USB-C for AI”: a universal plug that connects AI models to tools and data
  • Lets AI agents fetch files, call APIs, or use apps without custom integration code for each one
  • Created by Anthropic, now supported by nearly all LLM providers
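
To make the "universal plug" idea concrete, here is a rough sketch of the kind of messages involved. As far as I understand the MCP specification, clients and servers exchange JSON-RPC 2.0 messages, with methods such as `tools/list` and `tools/call`. Everything else here is an assumption for illustration: the `get_weather` tool, its `city` argument, and the in-process `handle` function are hypothetical, and a real server also needs an initialization handshake and a transport, which this toy skips.

```python
import json

# Hypothetical tool registry. "get_weather" is invented for this example.
TOOLS = {
    "get_weather": {
        "description": "Return a canned weather report for a city.",
        "inputSchema": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
        "handler": lambda args: f"Sunny in {args['city']}",
    }
}

def handle(request: dict) -> dict:
    """Toy stand-in for a server: answers tools/list and tools/call."""
    method = request["method"]
    if method == "tools/list":
        result = {
            "tools": [
                {"name": name, "description": t["description"],
                 "inputSchema": t["inputSchema"]}
                for name, t in TOOLS.items()
            ]
        }
    elif method == "tools/call":
        params = request["params"]
        text = TOOLS[params["name"]]["handler"](params["arguments"])
        result = {"content": [{"type": "text", "text": text}]}
    else:
        # Standard JSON-RPC error code for an unknown method.
        return {"jsonrpc": "2.0", "id": request["id"],
                "error": {"code": -32601, "message": "Method not found"}}
    return {"jsonrpc": "2.0", "id": request["id"], "result": result}

# What a client might send: first discover the tools, then call one.
list_request = {"jsonrpc": "2.0", "id": 1, "method": "tools/list"}
call_request = {
    "jsonrpc": "2.0",
    "id": 2,
    "method": "tools/call",
    "params": {"name": "get_weather", "arguments": {"city": "New York"}},
}

print(json.dumps(handle(list_request), indent=2))
print(json.dumps(handle(call_request), indent=2))
```

The point of the shared message shape is that any model that speaks it can use any tool that speaks it, without a one-off adapter for each pairing.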

Why are people excited by MCP?

  • Chatbots with live data: Access real-time info from calendars, CRMs, or databases
  • Multi-tool workflows: Chain actions across apps (e.g., search → summarize → email)

Vibe Coding

What is Vibe Coding?

Examples

Multimodal

  • Many models and tools seamlessly integrate text, images, video, and audio

  • Advanced Voice Mode now holds more natural, seamless conversations

  • Video Generation with Sora and Veo

  • Text-to-speech can now convey intonation

  • Speech-to-text can handle specific names and jargon

What Now?

  • How will you be using it?
  • How can your students use it to learn?
  • How can your students use it to build?